 mdp homomorphic network



MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Neural Information Processing Systems

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.
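The idea of building an equivariant layer numerically, rather than solving the constraint by hand, can be illustrated with a small sketch. The abstract does not spell out the authors' procedure, so the following is an assumption: it uses plain group averaging of a random weight matrix W, so that the result satisfies K_g W = W L_g for every group element g. The group (a single reflection that negates a 4-dimensional CartPole-like state and swaps two actions), the matrices L_g and K_g, and all function names are hypothetical choices made for illustration.

```python
import random

def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(m):
    """Transpose a matrix given as a list of rows."""
    return [list(col) for col in zip(*m)]

def identity(n, scale=1.0):
    """Return scale times the n x n identity matrix."""
    return [[scale if i == j else 0.0 for j in range(n)] for i in range(n)]

# Hypothetical two-element reflection group (an assumption, not from the source).
# Each entry is (L_g, K_g): L_g acts on the state, K_g on the action outputs.
SWAP = [[0.0, 1.0], [1.0, 0.0]]
GROUP = [
    (identity(4), identity(2)),   # identity element
    (identity(4, -1.0), SWAP),    # reflection: negate state, swap the two actions
]

def symmetrize(w, group):
    """Average K_g^{-1} W L_g over the group.

    Assumes every K_g is orthogonal, so its inverse is its transpose.
    """
    rows, cols = len(w), len(w[0])
    out = [[0.0] * cols for _ in range(rows)]
    for l_g, k_g in group:
        term = matmul(matmul(transpose(k_g), w), l_g)
        for i in range(rows):
            for j in range(cols):
                out[i][j] += term[i][j] / len(group)
    return out

def is_equivariant(w, group, tol=1e-9):
    """Check that K_g W equals W L_g (up to tol) for every group element."""
    for l_g, k_g in group:
        left, right = matmul(k_g, w), matmul(w, l_g)
        for left_row, right_row in zip(left, right):
            if any(abs(x - y) > tol for x, y in zip(left_row, right_row)):
                return False
    return True

random.seed(0)
w = [[random.uniform(-1.0, 1.0) for _ in range(4)] for _ in range(2)]
w_sym = symmetrize(w, GROUP)
print(is_equivariant(w, GROUP), is_equivariant(w_sym, GROUP))
```

Averaging is chosen here because it needs only the group's matrix representations and no hand-derived weight-sharing pattern. A fuller construction would also need a basis for the whole space of equivariant weights, which this sketch does not attempt.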



Review for NeurIPS paper: MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Neural Information Processing Systems

Additional Feedback: In the caption of Figure 4(b), I believe CNNs should be MLPs. The figure captions say that the 25%, 50%, and 75% quantiles are shown, but I only see one set of error bars. Line 142: should Equation 9 be Equation 8? Line 155: should this really be for all g, given that you are talking about a specific s' and a'? Is invertibility an assumption here? I can't immediately see why it should be needed.


Review for NeurIPS paper: MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

Neural Information Processing Systems

The paper proposes an approach for incorporating knowledge about symmetries or equivariances into neural network policies by providing a general-purpose method for constructing network layers based on knowledge of the relevant transformations. The reviews are generally positive: identifying effective ways of incorporating prior knowledge of this type into neural networks is an important research challenge that is of interest to the community. The proposed approach for constructing network layers seems novel, although there is some prior work that explores ways of exploiting such knowledge in particular application domains, or via alternative means such as data augmentation. An important caveat of the submission, remarked upon by all reviewers, is the experimental evaluation. It is currently limited to simple scenarios with perfect symmetries, which provide limited evidence of the utility of the approach in more complex or less idealized scenarios.


MDP Homomorphic Networks: Group Symmetries in Reinforcement Learning

arXiv.org Machine Learning

This paper introduces MDP homomorphic networks for deep reinforcement learning. MDP homomorphic networks are neural networks that are equivariant under symmetries in the joint state-action space of an MDP. Current approaches to deep reinforcement learning do not usually exploit knowledge about such structure. By building this prior knowledge into policy and value networks using an equivariance constraint, we can reduce the size of the solution space. We specifically focus on group-structured symmetries (invertible transformations). Additionally, we introduce an easy method for constructing equivariant network layers numerically, so the system designer need not solve the constraints by hand, as is typically done. We construct MDP homomorphic MLPs and CNNs that are equivariant under either a group of reflections or rotations. We show that such networks converge faster than unstructured baselines on CartPole, a grid world and Pong.